ABSTRACT

This report provides information on test development, 
test administration, and score interpretation for the Graduate 
Management Admission Test (GMAT). The GMAT, first administered in 
1954, provides objective measures of an applicant's abilities for use 
in admissions decisions by graduate management schools. It is 
currently composed of five sections: (1) Reading Comprehension; (2)
Problem Solving; (3) Practical Judgment; (4) Data Sufficiency; and 
(5) Usage. New test forms are developed systematically from a test 
item pool. New items are pretested at regular test administrations 
and evaluated empirically. Test specifications are the blueprints for 
assembling a final test form. Uniform test administration conditions 
are maintained through responsible supervision and careful test 
security procedures. In addition, several publications provide all 
examinees with GMAT test information and test taking strategies. The 
GMAT score scale has a mean of 500, a standard deviation of 100 for 
the base group, and a possible range from 200 to 800. Angoff's
methods are used to equate scores. Continuing validity studies
(predictive, content, and construct) provide an important basis for
score interpretation. Information on reliability, standard error,
descriptive statistics, and biographical data for examinees is also
given to assist in score interpretation. Appendices contain sample
GMAT test items and methods for calculating reliability coefficients 
and equating parameters. (BS) 
THE GRADUATE MANAGEMENT ADMISSION TEST:
Technical Report on Test Development and Score Interpretation for GMAT Users



I. INTRODUCTION

The GMAT was first administered in February 1954 to 
about 1,300 prospective students of graduate schools 
of business. Less than a year earlier, in March 1953, a
conference at which 12 graduate schools of business
were represented had agreed that a nationwide testing
program in this area would be useful. There followed a
period of vigorous activity. Two meetings of a policy 
committee formed to guide the new program were held. 
An important focus of this effort was the identification 
of suitable abilities to be measured by the test. All nec- 
essary steps for scoring, publicizing, developing, and 
administering the new test were worked out by ETS 
with the advice and approval of the Policy Committee. 
The test was called the Admission Test for Graduate 
Study in Business until 1976. In that year the test name 
was changed to Graduate Management Admission 
Test. 

From the outset the program was guided by repre- 
sentatives of participating schools. The test, which 
was the focus of the program, was prepared by ETS 
test development staff members and was adminis- 
tered, under secure conditions, throughout the United 
States and in a number of foreign cities. Scoring, re-
porting, and various statistical services designed to aid
in test development and score interpretation were pro- 
vided. Finally, research aimed at improving program 
effectiveness was identified as an integral part of pro- 
gram activities. As the program developed over the 
years, services relevant to admission but not directly 
concerned with testing were initiated. This report,
however, will be limited to matters directly related to 
the GMAT. 

The incorporation, in 1970, of the Graduate Business 
Admission Council (now the Graduate Management 
Admission Council) defined explicitly the role of the 
Council with respect to the test and other program ac- 
tivities. The Council, which consists of representatives 
of 54 graduate schools of management, is both a ser- 
vice organization and a professional organization. As a 
service organization it seeks to improve the selection 
process for graduate management schools by develop- 
ing and administering appropriate testing instruments, 
and informing schools and students as to the appropri- 
ate use of such instruments and other materials related 
to the selection process. In addition, it serves as a 
medium of information exchange between students
and schools. As a professional organization it serves 
as a forum for interchange of ideas and information. 
The Council sponsors the GMAT; ETS consults with the
Council on all matters of general policy affecting pro- 
gram activities that it conducts for the Council. 



Purpose of GMAT

The purpose of the GMAT is to provide objective mea-
sures of an applicant's abilities for use by graduate
management schools as one consideration in making
admissions decisions. In order to make the test as use-
ful as possible for this purpose, the test must measure
abilities that are relevant to successful performance in
graduate management school and that are developed
by a wide range of educational experiences; it must be
sufficiently long to provide a reasonably dependable
measure; it must be administered under uniform, se-
cure conditions; and it must be scored accurately. Fi-
nally, scores must be reported promptly in a conve-
nient form and accompanied by materials to aid in their
use. When these conditions are fulfilled, the test
scores may be relied upon by admissions officers to
supplement other data about applicants, particularly
previous academic performance.



Evolution of Test Composition 

The composition of the test with respect to the abilities 
measured and the relative weight given to each ability 
are the characteristics that define a particular test.

In planning the original 1954 form of the test, it 
seemed clear that both verbal and quantitative abilities 
were important, and that roughly equal weight should 
be given to each. Tests of these abilities were consid-
ered to be appropriate for students who had enrolled in
different undergraduate programs. Tests of these abil- 
ities that had proved to be successful in other 
programs, that seemed appropriate on judgmental 
grounds, and that could be produced expeditiously 
were chosen for the 1954 test. The test consisted of 
four separately timed sections, as follows: 

I. Verbal (25 minutes) 

II. Quantitative (65 minutes) 

III. Best Arguments (30 minutes)

IV. Quantitative Reading (55 minutes) 

Beginning in 1955, the Total test score was supple- 
mented by a Verbal part score, based on the Verbal and 
Best Arguments sections, and a Quantitative part 
score, based on the Quantitative section. Quantitative 
Reading items were not included in either part score. 
The new part scores provided users with information 
about an applicant's relative standing in verbal and
quantitative ability. 

The composition of the test was changed in several 
ways beginning in November 1961. Three new item
types, Organization of Ideas, Directed Memory (later
called Reading Recall), and Data Sufficiency, were intro-
duced, in part because research evidence indicated
that they would increase the predictive effectiveness of 
the test (Pitcher, 1960). The Organization of Ideas sec- 
tion provided an objective measure of an examinee's 
ability to identify a logical structure within a set of 
statements. The Directed Memory section measured 
reading comprehension under conditions that pre- 
vented the examinee from referring back to the reading 
passages when answering questions based on the pas- 
sages. The Data Sufficiency item type was a measure 
of quantitative ability based on the examinee's ability 
to analyze a mathematical problem without carrying
out the actual solution. Quantitative Reading, Verbal, 
and Best Arguments were dropped from the test. Of the 
five parts, Quantitative and Data Sufficiency defined 
the Quantitative part score and the other three parts 
defined the Verbal part score. The test included the fol-
lowing sections:

I. Directed Memory (Reading Recall) (35 minutes)

II. Quantitative (75 minutes)

III. Organization of Ideas (20 minutes) 

IV. Data Sufficiency (15 minutes) 

V. Directed Memory (Reading Recall) (35 minutes) 

In November 1966, Organization of Ideas was re- 
placed by a 20-minute Verbal Omnibus section that in- 
cluded antonyms, analogies, and sentence completion 
items. Except for this change the basic structure of the 
test remained the same until 1972, when two 20-minute 
sections of Practical Business Judgment replaced 35 
minutes of the time allocated to Reading Recall. Prac-
tical Business Judgment items were included in the
Verbal part score. At the same time, the Verbal Om-
nibus section was shortened from 20 minutes to 15
minutes.

In 1976 several changes were introduced in the com- 
position of the test. Of the two sections that defined 
the Quantitative part score, the 75-minute Quantitative
section that emphasized Data Interpretation items was 
replaced by a 40-minute Mathematics section that em- 
phasized problem solving items, and the time allotment 
for Data Sufficiency was increased from 15 to 30 min- 
utes. Several changes were made in the sections in- 
cluded in the Verbal part score. The 15-minute Verbal 
Omnibus section was replaced by a 15-minute Usage 
section. The new section called for the identification of 
errors in Standard Written English. It was introduced in 
recognition of the importance of written expression in 
management, a point that was highlighted in the find- 
ings of the Casserly and Campbell (1973) survey of 
skills and abilities needed by graduate students. A fur-
ther change, introduced in 1977, substituted 30 min-
utes of Reading Comprehension for the 35 minutes of
Reading Recall. It was judged that these item types
measured very similar abilities, but that Reading Com-
prehension would present fewer complications in test
administration.

The changes introduced in 1976 and 1977 brought 
GMAT to its present composition, which is as follows:



I. Reading Comprehension (30 minutes) 

II. Problem Solving (40 minutes) 

III. Practical Judgment (40 minutes) 

IV. Data Sufficiency (30 minutes) 

V. Usage (15 minutes) 

A complete recent form of GMAT is published in the 
1979-80 Guide to Graduate Management Education.
Appendix A of the present report gives sample items 
for each of the item types included in the test from 1954 
to the present time. 

This brief review of the evolution of the test suggests 
that changes have been gradual and relatively infre- 
quent. Thus, there is a strong continuity within the test 
over the years, a condition that is highly desirable if 
scores earned at different times are to be treated as 
comparable. 



II. DEVELOPING A NEW FORM OF GMAT

In an ongoing testing program, a systematic plan for
developing new forms of the test is essential, particu- 
larly to minimize the possibility that examinees will 
have an opportunity to anticipate some of the ques- 
tions included in the test. Ordinarily, a new form of the 
test should measure the same abilities and be at the 
same difficulty level as previous forms, but should be 
composed mainly or exclusively of items not included 
in any previous form. If changes are to be introduced 
between a new form and its predecessors, they should 
be made deliberately, not inadvertently, and gradually, 
not abruptly, in order to maintain comparability be- 
tween scores earned at different test administrations 
over a period of several years. This discussion will be 
based on the more typical situation in which the new 
form is designed to match earlier forms as closely as 
possible. 

Each new form of GMAT is composed of objective 
test items that call for the examinee to choose among 
five options of which only one is the best choice and is 
scored as correct. The first task in building a new form 
is to develop a supply of items that measure the abil- 
ities tested in the earlier forms and that are as free as 
possible of identifiable defects. It is also necessary to 
be able to compare the difficulty of new items in the 
pool with that of items included in previous tests. The 
new test form can then be matched with older test
forms with respect to overall difficulty. 

Developing an Item Pool 

The indispensable first step in building an item pool is
to create the first draft of an item. There are a number
of item-writing rules, but their value is mainly in reduc-
ing the proportion of items that are found later to be de-
fective. Items do need to be compatible with other
items of the same type when used in a test, a fact that
item writers consider in developing a possible item.
The actual production of items seems to depend main-
ly on a good understanding of the ability measured by a
particular kind of item, perceptiveness in identifying
tasks that will be suitable in difficulty for GMAT exam-
inees, and fertility in devising incorrect responses that
will attract the less able examinees. This is another
way of saying that item writing is an art. 

Once the item has been drafted, however, a series of
formal processes can be used to remedy apparent de-
fects, particularly in the way in which the item is ex-
pressed. Obviously, faulty grammar, awkwardness of ex-
pression, and inconsistencies in style can be corrected
by competent editing. Review by one or two persons fa- 
miliar with the item type may identify items that may be 
ambiguous, especially to examinees who are very high 
in the ability measured, and items that may inadver- 
tently give clues, particularly to sophisticated test 
takers, concerning the correct answer. Finally, items 
are reviewed by persons who are sensitive to expres- 
sions that are objectionable to women or to minority 
groups; items are then revised to remove this kind of 
defect. 



Pretest Item Analysis: Purpose 

Items that survive the review processes are next pre-
tested at a regular test administration. Because items
do not necessarily work in the way that their authors in- 
tended, it is important to make an empirical evaluation 
of each item before it is permitted to contribute to an 
examinee's score. Pretesting also makes it possible to
control the difficulty level of new test forms. These two 
purposes correspond to the two main kinds of statis- 
tical analyses applied to pretest data. First, the rela- 
tionship between examinees' performance on each 
item and their total scores on all items of that partic- 
ular type helps to identify items that need to be revised 
or, possibly, discarded. Second, an index of the diffi-
culty of the items, when adjusted to take account of the
ability level of the pretest group, is useful in controlling 
the level of difficulty of the test. 

The statistical analysis of each pretested item pro-
vides information about the relationship between per-
formance on the item and a score based on a set of
similar items in three different ways: 

(a) The biserial correlation coefficient between the item
and the total score on items of the same type;

(b) The mean score on items of the same type for exam-
inees choosing each of the five options and for stu-
dents who omit the item; and

(c) For students who rank in each fifth on the total
scores, the number choosing each option or omit-
ting the item.

Essentially, the biserial correlation coefficient is an 
objective index of the extent to which the examinees 
who answered the item correctly differ in average score
from the remainder of the group. The biserial correla-
tion coefficient, which is used for item analysis work at



ETS, adjusts the result to take account of the percent-
age of students who give the correct answer.* This ad-
justment is considered to make the resulting correla-
tion coefficients more nearly comparable for items at
different difficulty levels. Experience in using item 
analysis results has indicated that items that have a 
correlation below .30 need to be reviewed with special 
care to try to find out why examinees who earn high 
total scores on the set of items do not perform appreci- 
ably better on the item than do examinees who earn 
low total scores. 
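
The computation described above can be sketched in Python. This is an illustration only: it uses the textbook biserial formula of the kind given in the statistics texts cited in the footnote, and the function name, the 0/1 coding of the item, and the use of the population standard deviation are assumptions rather than a description of the ETS item analysis program.

```python
from statistics import NormalDist, mean, pstdev


def biserial(correct, totals):
    """Biserial correlation between a 0/1 item score and a total score.

    correct: list of 1 (right) or 0 (wrong), one entry per examinee
    totals:  list of total scores on items of the same type
    """
    n = len(correct)
    p = sum(correct) / n          # proportion answering correctly
    if p in (0.0, 1.0):
        raise ValueError("item needs both right and wrong responses")
    q = 1.0 - p
    m_right = mean(t for c, t in zip(correct, totals) if c)
    m_wrong = mean(t for c, t in zip(correct, totals) if not c)
    sd = pstdev(totals)           # assumed: population standard deviation
    std_normal = NormalDist()
    # Height of the normal curve at the point dividing p from q; this is
    # the adjustment for the percentage answering correctly.
    y = std_normal.pdf(std_normal.inv_cdf(p))
    return (m_right - m_wrong) / sd * (p * q / y)
```

An item whose coefficient falls below .30 would be flagged for the special review described above. With small or markedly non-normal samples the coefficient can fall outside the range from -1 to 1, which is a known property of the biserial formula.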

Pretest Item Analysis: An Example 

The detailed steps involved in using pretest data may 
be illustrated by discussing the example shown in 
Figure 1. 

Information comparing the test score of examinees 
who give the correct answer with those who choose 
one of the incorrect answers receives special attention. 
If too many high-scoring students choose a wrong an-
swer on the item, it often happens that the question is
open to misinterpretation. The statistical results are in-
tended only to supplement the search for flaws in the
item based on a thoughtful scrutiny of the item by re- 
viewers. 

The biserial correlation coefficient for the sample 
item, shown in the lower right hand corner of the print- 
out, was .54 for the group tested. This is well above the 
"danger-point" figure of .30. Thus, it is unlikely that the
more detailed statistical results available for each op- 
tion will reveal serious flaws in the item. 

A second way of looking at the results is to consider 
the average test score (expressed on a scale to be de- 
scribed later) of those choosing each option. This anal- 
ysis makes it possible to spot any wrong answer that 
seems to be attracting too many above-average stu- 
dents. On the sample item, the average total score on 
quantitative items for examinees who chose the cor- 
rect option (designated by an asterisk) is 16.3; the high- 
est average for an incorrect option is 12.5. The average 
for all 2,000 examinees is 13.0. 

One feature of the item analysis procedure designed 
for the convenience of those who use the results is that
the average total score is always set at 13.0 for the 
group on which the item analysis is based. The stan- 
dard deviation of Total scores for the total item anal- 
ysis group is always set at 4.0. Thus, it is easy to tell 
whether the examinees choosing a particular option 
are above or below the average of the total group, and
by how much. The use of a uniform scale for the test 
score enables persons who work with item analysis re- 
sults to get some idea of how well an item is working by 
looking at the pattern of average total scores for the 
various options. 
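
A rescaling of this kind may be sketched as follows. The report does not state how the uniform scale is computed, so the sketch assumes a simple linear transformation of raw scores and the population standard deviation; the function name and parameters are hypothetical.

```python
from statistics import mean, pstdev


def rescale(raw_scores, target_mean=13.0, target_sd=4.0):
    """Linearly rescale scores so the group has the target mean and SD."""
    m = mean(raw_scores)
    s = pstdev(raw_scores)        # assumed: population standard deviation
    if s == 0:
        raise ValueError("all scores are identical; cannot rescale")
    return [target_mean + target_sd * (x - m) / s for x in raw_scores]
```

On such a scale, an option whose choosers average 16.3 lies a little more than three-quarters of a standard deviation above the group mean, since (16.3 - 13.0) / 4.0 is about 0.8.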

Along with the average test scores, the bottom sec-
tion of the printout also shows how many examinees



*The biserial coefficient is described more fully in statistical textbooks
(e.g., Guilford and Fruchter, 1973).
Figure 1. Item analysis results for a sample item.

A man has exactly enough fencing to enclose a rectangular region 3 times as long as it is wide. He discovers that if he uses the
same amount of fencing to enclose a square region, he can enclose 225 additional square feet. How many feet of fencing does he have?

(A) 30 (B) 120 (C) 150 (D) 675 (E) 900 
chose each option. Of course, the item writer attempts
to create wrong answers that will be attractive to exam-
inees and thus to sharpen the differentiation between
able and less able students on the ability measured by 
the item. The extent to which all options are attracting 
reasonable numbers of examinees tells the item writer 
how successful he or she has been in formulating ef- 
fective wrong answers. 

Besides the average score for each of the five possi- 
ble responses, the printout also shows the average 
total score for examinees who omitted the item. An
item is considered to be an "omit" only if the examinee
has answered a subsequent item in the separately-
timed part of the test under consideration. This defini-
tion attempts to distinguish between "omits" (i.e.,
items that are considered but not answered) and items
at the end of the test that the examinees may not even
have read. In Figure 1, the group of examinees who
omitted the sample item is fairly large and their aver-
age total score is slightly higher (13.1 vs. 13.0) than that
for the entire item analysis group. However, because
the mean score of examinees who reached the item is
13.4, those who omitted it have a slightly lower score
than all examinees who reached it. For this item, a con-
siderable proportion of examinees at all five ability
levels omitted it. On the whole, the proportion of omits
on this item is larger than would be ideal, but is not suf-
ficiently large to warrant revising it.
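
The rule that separates an omitted item from one that was not reached can be sketched as a short Python function. The function name and the use of None to mark a blank response are assumptions made for illustration; the sketch covers one separately timed part of the test for one examinee.

```python
def classify_blanks(responses):
    """Label each item in one timed section for one examinee.

    responses: list of answers in item order; None marks a blank.
    A blank counts as an "omit" only if some later item in the same
    section was answered; trailing blanks are "not reached".
    """
    answered = [i for i, r in enumerate(responses) if r is not None]
    last_answered = max(answered, default=-1)
    labels = []
    for i, r in enumerate(responses):
        if r is not None:
            labels.append("answered")
        elif i < last_answered:
            labels.append("omit")
        else:
            labels.append("not reached")
    return labels
```

For example, the response string A, blank, C, blank, blank yields one omit (the second item) and two items not reached (the last two).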



Finally, the numbers in the upper portion of the print- 
out provide still greater detail on the relation between 
total score and responses. The right-hand column 
(headed "High Nj") shows the number of examinees in
the top fifth on total score who gave each response,
and the other four columns show the number of exam-
inees in successively lower fifths who gave each re-
sponse. It will be noted that the numbers for the correct
response (B) increase consistently from 25 in the bot-
tom fifth to 172 in the top fifth and that the numbers for
the most popular incorrect response (D) show a down-
ward trend. The figure for "TOTAL" shows the number
who answered or omitted the item. Because there are
exactly 400 examinees in each of the five groups, the
difference between the figure reported for "TOTAL"
and 400 shows how many did not reach this item. For 
the top fifth only 44 did not reach it; for the bottom fifth, 
143 did not reach it. 
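
A tabulation of this kind can be sketched in Python. The record layout, the function name, and the handling of tied scores (examinees are simply sorted and cut into five equal-sized groups) are assumptions for illustration and are not taken from the ETS printout program.

```python
from collections import Counter


def tabulate_by_fifth(records):
    """Count responses to one item within each fifth on total score.

    records: list of (total_score, response) pairs, where response is
    an option letter, the string "omit", or None if the item was not
    reached. Returns five dicts ordered from the lowest fifth to the
    highest.
    """
    ranked = sorted(records, key=lambda rec: rec[0])
    n = len(ranked)
    table = []
    for k in range(5):
        group = ranked[k * n // 5:(k + 1) * n // 5]
        counts = Counter(resp for _, resp in group if resp is not None)
        total = sum(counts.values())   # answered or omitted the item
        table.append({
            "size": len(group),
            "counts": counts,
            "total": total,
            "not_reached": len(group) - total,
        })
    return table
```

With 400 examinees in each fifth, a "TOTAL" of 356 in the top fifth corresponds to the 44 examinees who did not reach the sample item.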

A preliminary idea of how difficult an item is can be 
obtained by dividing the number of examinees who an- 
swered it correctly by the number of examinees who 
reached it. For the sample item, this figure (P.) turned
out to be .23. The figure in the box labeled "Mtotal"
shows the average score on the total test for the exam-
inees who reached the sample item. Because this aver-
age is 13.4, we may conclude that the examinees who
reached this item were more able than the rest of the
item analysis group. We need an estimate of how diffi-
cult the item would be for the total item analysis group.
The measure of difficulty used for item analysis, called 
delta (Δ), provides such an estimate.

Delta is so defined that a higher value of delta means 
a more difficult item and thus a smaller percentage of
examinees in the item analysis group who would an-
swer it correctly. It also assumes that a given change in
the delta value would have a greater effect on propor- 
tion correct for items in the middle of the difficulty 
range than for those at the extremes. The following 
table shows the proportion of the item analysis group 
who would be expected to give the correct answer for 
items having selected values of delta:





Δ Value     Proportion Correct

  17              .16
  15              .31
  13              .50
  11              .69
   9              .84
   7              .93


The delta scale for item difficulties is defined in 
terms of a normal curve having a mean of 13 and a stan-
dard deviation of 4. Then the percentage of the normal
curve above a particular difficulty value is equal to the 
percentage of members of the item analysis group who 
answered the item correctly. Figure 2 illustrates this 
point. 
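
The definition of the delta scale can be checked with a few lines of Python using the standard library's normal distribution. The function names are hypothetical, and the sketch covers only the conversion between delta and proportion correct on a normal curve with mean 13 and standard deviation 4; it does not include the adjustment from examinees who reached the item to the total item analysis group.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()   # mean 0, standard deviation 1


def proportion_from_delta(delta):
    """Proportion of the normal curve (mean 13, SD 4) above delta."""
    return 1.0 - _STD_NORMAL.cdf((delta - 13.0) / 4.0)


def delta_from_proportion(p):
    """Delta whose upper-tail area on the same curve equals p."""
    return 13.0 + 4.0 * _STD_NORMAL.inv_cdf(1.0 - p)
```

Rounded to two places, `proportion_from_delta` gives the values in the table above (.16 for a delta of 17, .50 for 13, .93 for 7). The unequal steps between neighboring rows reflect the property that a given change in delta has a greater effect on proportion correct in the middle of the difficulty range than at the extremes.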

Because test construction often requires precise 
measurements of difficulty level, it is necessary to take 
account of the fact that different item analysis groups 
differ from each other in ability level. A relatively sim- 
ple way of adjusting for the difference between two 
groups is applicable provided that sufficient items (us- 
ually 20 or more) have been administered to both 
groups. Each of these common items will have two val- 
ues of delta—one for each group. When the pairs of 
deltas are plotted on ordinary graph paper, they gener- 
ally fall along a straight line. It is then possible to deter- 
mine a linear equation relating the two sets of deltas. 
The resulting equation can be applied to transform a 
delta obtained on one group to the corresponding delta 
for the other group. The process of equating item diffi- 
culties for a new pretest in continuing programs is fa- 
cilitated by the fact that any item that has previously 
been equated can be used in the set of 20 or more items
needed for equating the new items.
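
The adjustment can be sketched as a straight-line fit to the pairs of deltas for the common items. The report does not say how the line is fitted, so ordinary least squares is used here as a stand-in; the function names are hypothetical and any data passed to them would be illustrative values, not figures from the GMAT program.

```python
from statistics import mean


def fit_delta_line(deltas_new_group, deltas_reference_group):
    """Fit reference_delta = slope * new_delta + intercept.

    Each list holds one delta per common item, in the same item order.
    Least squares is an assumption; the source only says that a linear
    equation relating the two sets of deltas is determined.
    """
    mx = mean(deltas_new_group)
    my = mean(deltas_reference_group)
    sxx = sum((x - mx) ** 2 for x in deltas_new_group)
    if sxx == 0:
        raise ValueError("common items must differ in difficulty")
    sxy = sum((x - mx) * (y - my)
              for x, y in zip(deltas_new_group, deltas_reference_group))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


def equate_delta(observed_delta, slope, intercept):
    """Transform a delta from the new group to the reference group."""
    return slope * observed_delta + intercept
```

In practice the fit would rest on the 20 or more common items mentioned above, and the fitted line would then be applied to the observed deltas of the newly pretested items.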

A delta value calculated for a particular item analysis 
group is called an observed (or raw) delta (Δo). Because
item analysis groups vary in ability level, the observed
delta for an item will be higher if it was administered to
a less able group than if it had been administered to a
more able group. Observed deltas can be adjusted sta-
tistically so that they represent the difficulty level that
each item would have had if it had been administered
to a standard reference group. The adjusted delta for
an item, called its equated delta (Δe), may be compared
directly to equated deltas for other items, even though
the item analyses were based on different groups.
The sample item shown in Figure 1 has an observed
delta of 16.4 and an equated delta of 14.2. Clearly, the 
item is more difficult for the group on which its item 
analysis was based than it would have been for the 
standard reference group to which equated deltas are 
referred. This result indicates that the group on which 
the item analysis was based was less able than the 
standard reference group. 

Assembling the Final Test Form 

If the items in the item pool are the building blocks 
from which a new test form is built, the test specifica-
tions are the blueprint that guides the construction of a
test form for operational use in the testing program.
The specifications state the number of items of a par-
ticular item type and the level and range of difficulty of
the items to be included in each separately-timed sec- 
tion. In the usual case, the specifications will be de- 
signed so that the new test form will match recent pre- 
vious forms in these respects. 

For GMAT and other continuing testing programs, the
number of items in each separately timed section in re-
lation to the time limits is regularly monitored for each
new test form. Although several indicators are used for
this purpose, the percentage answering the last item
may serve to illustrate the principle. As a general guide-
line, it is considered that, if about 80% of the exam-
inees attempt the last item, the time limits and number 
of items are reasonably consistent with each other. 
The results for this indicator for the two most recent 
forms for which results are available are as follows: 

                               % Reaching Last Item

Section                        Form A     Form B

Reading Comprehension           81.4       74.8
Problem Solving                 20.5       39.2
Practical Business Judgment     88.2       83.0
Data Sufficiency                77.3       70.1
Usage                           61.4       64.3
Practical Business Judgment     83.6       90.6



These results suggest that Usage and, to a greater ex- 
tent, Problem Solving may include a few more items 
within the allotted time than would be optimal. Partic-
ularly for Problem Solving, however, the results may be
affected by the tendency of some examinees not to
tackle the more difficult items even when they have
time to do so. To the extent that this occurs, the per-
centage attempting the last item cannot be regarded
as a satisfactory indicator of the degree to which time
allowed and number of items are suitably matched.
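
The comparison with the 80% guideline can be sketched in Python using the percentages reported for Forms A and B. Averaging the two forms and ranking sections by their shortfall are choices made for this illustration only; the report treats the guideline as approximate and notes that it is one of several indicators.

```python
GUIDELINE = 80.0   # approximate percentage cited in the text

# (section, Form A, Form B): percentage reaching the last item
REACHED_LAST_ITEM = [
    ("Reading Comprehension", 81.4, 74.8),
    ("Problem Solving", 20.5, 39.2),
    ("Practical Business Judgment", 88.2, 83.0),
    ("Data Sufficiency", 77.3, 70.1),
    ("Usage", 61.4, 64.3),
    ("Practical Business Judgment", 83.6, 90.6),
]


def rank_by_shortfall(rows, guideline=GUIDELINE):
    """Rank sections by how far the two-form average falls below the
    guideline, largest shortfall first (negative means above it)."""
    ranked = []
    for section, form_a, form_b in rows:
        average = (form_a + form_b) / 2
        ranked.append((section, guideline - average))
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

Ranked this way, Problem Solving shows by far the largest shortfall and Usage the next largest, which matches the two sections singled out above.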

If a new item type is to be introduced, it is necessary 
to make a judgment concerning the number of items 
that can be completed by the great majority of exam- 
inees in the allotted time. This judgment is guided by 
experience with the item type in pretest studies or in 
other testing programs. It is also necessary to judge 
the appropriate level and range of item difficulties so 
that the new test will be appropriate for GMAT takers. 

Once specifications are set, the tasks of selecting a 
set of items that will fulfill the desired specifications 
and of arranging the items in a suitable manner can be 
performed. Items are often arranged in an ascending 
order of difficulty, but other considerations such as 
grouping similar items may be given priority in deter- 
mining the arrangement of items. Finally, as an essen- 
tial test development step, suitable directions to the 
examinee must be provided. 

The draft test is reviewed to insure that editorial and 
printing layout rules have been followed and that errors
in the item or the designation of the correct answers
have not been introduced. Each separately timed part
is reviewed with respect to content balance and the
test as a whole is reviewed from the viewpoint of how
women and ethnic minorities are presented in items
that refer to individuals or groups. 



III. ADMINISTERING THE TEST 

Maintaining Uniform Testing Conditions 

Administering the test under uniform conditions is es-
sential if scores earned by different examinees are to



be strictly comparable. It is especially important that 
the time limits for each separately timed test section
be uniform for all examinees, that the test directions be 
fully understood, and that distractions be held to a min- 
imum. Because an examinee who had access to test 
items before the examination would gain an unfair ad- 
vantage over other examinees, elaborate precautions 
are taken to keep test booklets secure before, during,
and after the examinations. Examinees are not per- 
mitted to use any kinds of extraneous materials (e.g., 
dictionaries, calculators, notes) during the test and su- 
pervisors and proctors are cautioned to be alert to pre- 
vent copying. To insure that the person for whom 
scores are reported is actually the person who took the 
test, each examinee is asked to provide positive identi- 
fication and this identification is checked by the super- 
visor or proctor before the examination begins. From a 
logical viewpoint, the goal of insuring that each exam- 
inee is tested under uniform conditions calls for thor- 
ough efforts to preserve test security and to prevent 
copying and impersonations. 

The key to maintaining uniform conditions during the 
testing sessions is the selection of supervisors who 
have good judgment and who take a highly responsible
attitude toward administering the tests. In addition, the
GMAT Supervisor's Manual provides specific informa-
tion on the many detailed tasks that supervisors need
to perform. From the time when the examinees have 
been seated until the examinees are dismissed, each 
statement to be made in conducting the test is speci- 
fied by the manual and is read verbatim by the person 
administering the test in a particular room. Finally, the 
manual includes a brief form on which the supervisor is 
asked to report any significant irregularities affecting 
individual candidates (e.g., illness, defective test mate- 
rials) or affecting a group of candidates (e.g., mistim- 
ing). These Supervisor's Irregularity Reports identify 
any significant deviations from uniform testing condi- 
tions. Each reported deviation is given to an appropri-
ate ETS staff member for action. Ordinarily, the action
is based on guidelines or procedures established for 
handling various difficulties. For example, if a supervi- 
sor reports suspected copying, the irregularity is re- 
ferred to the staff group concerned with test security. 
Again, if the supervisor discovers that a mistiming has 
occurred, the report is evaluated by a GMAT program 
direction staff member, possibly in consultation with 
test statisticians, to determine whether a special test
administration may be needed, or whether some other 
solution is appropriate. 

Because any breach of test security involves a risk 
that some examinees will gain an unfair advantage, the
care that is taken within ETS and by the companies re- 
sponsible for printing the tests to protect the security 
of the tests is an essential part of maintaining uniform 
test conditions. After the tests have been adminis- 
tered, further steps for detecting copying or impersona- 
tion may be performed, based on analyses of answer 
patterns or handwriting comparisons. These proce-
dures are followed if a school questions the scores
earned by one of its applicants or if a person repeating
the test has shown an exceptionally large score gain. 
The procedures followed have been carefully designed 
both to protect the examinee whose score is bona fide
and to avoid reporting a score that is not a fair repre-
sentation of the examinee's ability because he or she
has copied or has been impersonated. 



Preparing the Examinee for the Test 

Over and above the need for maintaining uniform con- 
ditions at the test administration, an obligation has 
been accepted to provide examinees with information 
about the test and how to approach it. In this way, the 
possible advantage of more sophisticated test-takers
should be minimized. Consequently, the Bulletin of In-
formation supplied to prospective examinees goes
beyond providing the necessary information about the 
mechanics of registration and procedures for dealing 
with exceptional conditions that may arise. In addition,
it provides a carefully prepared set of sample items de-
signed to give examinees a realistic idea of the kinds of
items that they may expect to encounter on the actual 
examination. Moreover, there is a discussion of what 
GMAT measures and of the role it plays in admission 
to graduate management schools. This discussion 
should be helpful, for example, in reassuring exam-
inees who have an exaggerated idea of the importance
of test scores in admissions. 

In recent years, the Council has published a com-
plete sample test for use by examinees. The most re-
cent publication is in the 1979-80 Guide to Graduate
Management Education. Moreover, both general test- 
taking suggestions and advice on how to deal with 
each item type have been provided. An examinee who 
works through the sample test and who adheres to the 
time limits for each section should have a very good 
idea of the demands that the actual test will make. Be- 
cause an answer key is also provided, and a number of 
the items are discussed, the prospective examinee may 
obtain an idea of how well he or she has done, and may 
be alerted to the need for careful attention to all infor-
mation given by an item in choosing an answer. The
main value of the sample test may well be that working 
through it enables the examinees to adopt a more real- 
istic attitude in approaching the actual test. 

One point of test-taking strategy that has received 
special attention arises from the fact that GMAT scores 
are corrected for guessing; that is, a percentage of the 
number of wrong answers is subtracted from the num- 
ber of right answers. This procedure is designed to dis-
courage blind guessing. It is important, however, for ex-
aminees to understand that they should answer ques-
tions about which they have some information even if
they are not sure of the correct answer. In order to em- 
phasize this point, the person administering the exami-
nation reads the following statement to the examinees:

"Although you have already read instructions 
about guessing, they are very important, and I
have been asked to summarize them before you
begin the test. Your GMAT scores will be based on 
the number of questions you answer correctly 
minus a fraction of the number you answer incor- 
rectly. Therefore, it is unlikely that mere guessing 
will improve your scores significantly, and it does 
take time. However, if you have some knowledge 
of a question and can eliminate at least one of the 
answer choices as wrong, your chance of getting 
the right answer is improved, and it will be to your 
advantage to answer the question. If you know 
nothing at all about a particular question, it is 
probably better to skip it."
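
The arithmetic behind this advice can be sketched briefly. The report states only that "a fraction" of the wrong answers is subtracted; the sketch below assumes the conventional fraction 1/(k - 1) for items with k answer choices and assumes k = 5, neither of which is stated in this section. The function names are hypothetical.

```python
from fractions import Fraction


def formula_score(right, wrong, choices=5):
    """Rights minus a fraction of wrongs; omitted items add nothing.

    ASSUMPTION: the fraction is 1/(choices - 1) and items have five
    choices. The report itself says only "a fraction".
    """
    return Fraction(right) - Fraction(wrong, choices - 1)


def expected_gain_from_guess(remaining_choices, choices=5):
    """Expected change in the formula score from guessing at random
    among the answer choices that have not been eliminated."""
    p_right = Fraction(1, remaining_choices)
    penalty = Fraction(1, choices - 1)
    return p_right - (1 - p_right) * penalty


# Blind guessing among all five choices gains nothing on average,
# while eliminating even one choice makes the expected gain positive.
blind = expected_gain_from_guess(5)
informed = expected_gain_from_guess(4)
```

Under these assumptions the expected gain from a blind guess is zero, which is consistent with the statement read to examinees that mere guessing is unlikely to improve scores.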



IV. FACILITATING SCORE INTERPRETATION
Defining the Score Scales 

An important step in initiating a new testing program is 
the definition of the score scale in terms of which test 
performance is reported. The definition of the scale is 
particularly important when, as in GMAT, there is a con- 
tinuing program of building new test forms composed
mainly or entirely of new test items and yielding scores 
that are interchangeable with scores on earlier forms 
of the test. Any change in the definition of the score 
scale in a continuing program conflicts with strict com-
parability of scores from one test form to another and
thus introduces confusion and possible error. 

Several widely-used tests (College Board Scholastic 
Aptitude Test, Law School Admission Test, Graduate
Record Examinations Aptitude Test) have scales so de- 
fined that some reasonably appropriate group of exam- 
inees has a mean score of 500 and a standard deviation 
of scores of 100. A further element in the definition may 
prescribe that no reported score can be higher than 800 
or lower than 200. The GMAT scale was so defined that
it had a mean of 500 and a standard deviation of 100 for 
the base group and a possible range from 200 to 800. In 
establishing the GMAT scale, the base group that was 
used included all examinees tested in February, May, 
and August, 1954. In effect, this choice of the reference 
group assumed that the ability level of examinees in 
future years would not change so drastically as to re- 
quire a revision of the definition of the scale. Many 
graduate management schools find the score range 
650-800 important in admissions decisions and a sub- 
stantial number of examinees score in the 200 to 350
range. There has been no apparent need to expand or 
contract the range of possible scores. Thus, the defini- 
tion of the scale on the basis of 1954 examinees re- 
mains satisfactory. Of course, the scale value of 500 
has not represented the average performance of exam-
inees for many years. The current average score, now
about 460, can be found only by consulting descriptive
statistics on current examinees.

The choice of the 200 to 800 scale has the advantage 
that scores on GMAT cannot be confused with IQ's,
percentage grades, or percentile ranks. It is true, how-
ever, that the use of the same numerical scale for
GMAT as for other widely-used admissions tests may
occasionally cause confusion if it is thought, for exam-
ple, that a 500 on GMAT has the same meaning as a 500
on the Law School Admission Test. Both because the
various tests measure somewhat different abilities and
because the examinee groups used in defining the
score scales were different for the different tests, this
assumption of comparability is clearly unwarranted.

The score scales for the Verbal and Quantitative 
tests were established on the basis of the same group 
used for defining the total score scale. However, the 
score scales for these tests were set so that the base 
group would have a mean of 30 and a standard devia-
tion of 8 for these scores. As part of the definition, it
was decided that no scaled score for Verbal or Quanti-
tative could be lower than 0 or higher than 60. The use
of different units minimizes the risk that the part scores 
will be confused with the total scores and serves as a 
reminder to users that these tests were designed to 
supplement rather than to replace the total score. 
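
The scale definitions above amount to a linear transformation of base-group performance followed by truncation at the ends of the scale. The following is a minimal sketch of that idea only; the base-group raw mean and standard deviation used here (50 and 15) are hypothetical, the function name is an assumption, and reported scores on later forms come from equating rather than from a direct calculation of this kind.

```python
def scaled_score(raw, base_mean, base_sd,
                 scale_mean=500, scale_sd=100, lowest=200, highest=800):
    """Place a raw score on a scale defined by a base group.

    The base group is given the mean `scale_mean` and standard deviation
    `scale_sd`; results are truncated to the range [lowest, highest].
    """
    z = (raw - base_mean) / base_sd          # standing relative to base group
    value = round(scale_mean + scale_sd * z)
    return max(lowest, min(highest, value))


# Hypothetical base-group raw statistics (not taken from the report).
BASE_MEAN, BASE_SD = 50, 15

total = scaled_score(65, BASE_MEAN, BASE_SD)               # one SD above the mean
capped = scaled_score(110, BASE_MEAN, BASE_SD)             # truncated at the top
verbal = scaled_score(65, BASE_MEAN, BASE_SD, 30, 8, 0, 60)  # Verbal/Quantitative scale
```

The same function serves for the total score scale and for the Verbal and Quantitative scales; only the scale mean, standard deviation, and limits differ.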

Score Equating 

In a program that offers tests at several administra-
tions each year, and gives different test forms at each 
administration, a procedure that will permit scores on 
the different test forms to be used interchangeably is
essential. Only in this way can admission officers con-
fidently assume that scores earned at different test ad-
ministrations are comparable.

If two different forms of the same test are to yield in- 
terchangeable scores, it is important that the composi- 
tion of the two tests be as similar as possible. Even 
modest changes in the weight given to various abilities 
result in some loss of strict comparability. This fact 
does not preclude changes in the test but it does em-
phasize the need for considering changes carefully be-
fore introducing them. It is also highly desirable that
the difficulty level of items in the new form be matched 
as closely as possible with the difficulty level of items 
in the previous form. It is this use of item difficulties in 
test construction that makes the precise determination 
of item difficulty indexes, discussed earlier in this re- 
port, so important. 

Assuming that the new test form has been carefully 
matched with previous forms with respect to abilities 
measured and difficulty level, score equating com-
pletes the process of making scores fully interchange-
able between the new form and previous forms.

The method of score equating described in this re- 
port was introduced in 1962, and has served as the 
basic method for score equating in GMAT since that 
time. It is expected that extensive modifications in 
GMAT equating procedures will be required as the re- 
sult of legislation enacted by New York State requiring 
disclosure of test questions following each test admin- 
istration. This section, accordingly, should be consid- 
ered as describing how score comparability was main- 
tained during the period from 1962 to the present time. 



This method of equating calls for administering each
new form with an old form so that the groups taking
each form are substantially equal in ability. In the sim-
plest application of this method, the old form and new
form are alternated in each package of test books.
When the number of examinees taking each form is
large, it can safely be assumed that this process will
produce groups that are closely matched in ability
level. Because the old form has already been equated,
it is possible to calculate the mean and standard devia-
tion of reported scores on GMAT Total for the group tak-
ing it. Then, equating can be done by determining an
equation that, when applied to raw scores on the new
form, will yield a mean and standard deviation equal to
the mean and standard deviation of reported scores on
GMAT Total for the group taking the old form. The same
procedure is used for equating Verbal and Quantitative
scores for the new test form.

The development of the linear equation relating raw
scores to reported scores on the new form can be de-
scribed briefly. For the old form, the equation relating
raw scores ($X_o$) to reported scores calls for multiplying
the raw score by a constant ($A_o$) and adding a constant
($B_o$). We want to determine values of $A_n$ and $B_n$ for the
new form so that the reported scores for the new form
and the old form will have the same mean and standard
deviation for the two equating groups.

If the standard deviations of reported scores for the
two equating groups are to be made equal, we may
write:

$$A_n \sigma_n = A_o \sigma_o ,$$

where $\sigma_n$ and $\sigma_o$ are the raw score standard deviations
of the new and old forms respectively. Then,

$$A_n = A_o \frac{\sigma_o}{\sigma_n} .$$

Suppose, for example, that the new form has a larger
standard deviation than the old form. Then, $A_n$ will be
proportionately smaller than $A_o$, and equating will com-
pensate for this difference between the two test forms.

Similarly, if the mean reported scores for the two
equating groups are to be equal,

$$A_n M_n + B_n = A_o M_o + B_o ,$$

where $M_n$ and $M_o$ are the mean raw scores for the new
and old forms, respectively, so that

$$B_n = A_o M_o + B_o - A_n M_n .$$

Thus the new multiplier ($A_n$) and the new additive con-
stant ($B_n$) may readily be determined using the mean
and standard deviation of raw scores on the new and
old forms and the $A_o$ and $B_o$ values for the old form.

A simplified example of the way this equation works
can be developed if we suppose that $A_o$ is equal to $A_n$.
Then, if the new form is easier than the old form, its
mean raw score will be higher than the mean raw score
for the old form, so that $M_n$ will be larger than $M_o$. Be-
cause the product of $A_n$ with $M_n$ will be larger than the
product of $A_o$ with $M_o$, application of the equation will
make $B_n$ smaller than $B_o$. Thus, if standard deviations
are equal, equating compensates for an easier test by
reducing the size of the additive constant. Of course,
under realistic conditions, the relative size of $B_o$ and $B_n$
will depend on both the relative size of $A_o$ and $A_n$ and
on the relative size of $M_o$ and $M_n$.
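
The two conditions above (equal standard deviations and equal means of reported scores) determine the new conversion completely, and the calculation can be sketched in a few lines. The raw scores and the old-form constants below are hypothetical, and the function name is an assumption; Appendix B gives the report's own sample calculation.

```python
from statistics import mean, pstdev


def linear_equating(new_raw, old_raw, a_old, b_old):
    """Return (a_new, b_new) for the new form.

    The constants are chosen so that reported scores on the new form
    have the same mean and standard deviation as reported scores on
    the old form for the two equating groups.
    """
    m_new, m_old = mean(new_raw), mean(old_raw)
    s_new, s_old = pstdev(new_raw), pstdev(old_raw)
    a_new = a_old * s_old / s_new                     # equal standard deviations
    b_new = a_old * m_old + b_old - a_new * m_new     # equal means
    return a_new, b_new


# Hypothetical equating groups: the new form is easier (higher mean)
# and has a larger raw score standard deviation.
old_raw = [40, 50, 60, 70, 80]
new_raw = [48, 60, 72, 84, 96]
a_old, b_old = 5.0, 200.0

a_new, b_new = linear_equating(new_raw, old_raw, a_old, b_old)
old_reported = [a_old * x + b_old for x in old_raw]
new_reported = [a_new * x + b_new for x in new_raw]
```

In this illustration the larger standard deviation of the new form yields a smaller multiplier, and the two sets of reported scores share the same mean and standard deviation.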

In some instances, three forms are packaged in a se-
quence, making it possible to equate one new form to
two old forms and thus to check on the precision of
equating, or to equate two new forms to one old form. A
chart showing the linkages involved in maintaining the
GMAT score scale is shown in Figure 3. It will be noted
that each form shown has been linked directly or indi-
rectly to the first form used in the program.

The basic equating method used for GMAT requires,
as a practical matter, that both the old and the new 
forms be administered in the same testing room. On 
the rare occasions when this condition cannot be ful- 
filled, more complex procedures utilizing common 
items are necessary. A comprehensive discussion of 
equating methods can be found in Angoff (1971). Ap-
pendix B of this report provides a sample of the calcu-
lations involved in determining the linear equation re-
lating raw scores to scaled scores.


Validity Studies: Purpose and Background 

From the beginning of the program, an important basis
for score interpretation has been provided by validity 
studies. These studies provide objective evidence on 
the extent to which scores predict subsequent perfor- 
mance. These studies have generally used first year 
average grades as the measure of academic achieve- 
ment and test scores and undergraduate average 
grades as predictors. Because undergraduate grades 
have a long history of acceptance as one factor in ad- 
missions, the relative validity of the test scores and 
previous grades is a point of special interest. A closely
related question is the extent to which the use of test 
scores along with previous grades results in more ef- 
fective prediction. 

The first validity studies were initiated in 1955, as
soon as the students tested in 1954 had earned first-
year grades (Olsen, 1957). A second series of studies
was initiated in 1958 as part of an effort to evaluate
possible new item types for inclusion in the test (Pit-
cher, 1960). In 1963, all schools represented on the
Council were invited to participate in validity studies, 
and 19 did so. During the three-year period 1967 
through 1969-70, 67 graduate schools participated in a 
comprehensive program in which the study reports 
were supplemented by regional seminars at which ad- 
missions methods as well as study results were dis- 
cussed (Pitcher, 1972). For a number of years subse-
quent to this major effort, the responsibility for con-
ducting validity studies rested solely with the schools
that use the tests. Recently, a validity study service has
been instituted. The new service emphasizes flexibility
by facilitating the use of additional predictors beyond
test scores and undergraduate average grades, of addi-
tional measures of success beyond first-year grades,
and of subgroups as well as the total student group. A
manual for users of the service has been prepared by
Powers and Evans (1977). This manual is included as
one section of the GMAC Handbook. In 1977-78, 10
schools participated in a pilot study of the new service
(Powers and Evans, 1978), and during 1978-79, 25
schools participated in the service. Nearly all partici-
pating schools have taken advantage of the options
provided by the new system.

Figure 3. Genealogical chart showing linkages
between GMAT forms.

Validity Studies: Methods 

Because validity study results are customarily ex-
pressed in terms of correlation coefficients, a brief dis-
cussion of this index should be useful. Figure 4 pre-
sents graphically the relationship between test scores
and grades for a group of students. Consideration of
the plotted points reveals a clear upward trend. The
correlation coefficient provides a widely-accepted, ob-
jective method of summarizing a set of data of this
kind. Essentially, the coefficient measures how closely
a straight line fitted to the data describes all the points.
When the trend is upward from left to right, the coeffi-
cient is given a plus sign; when the trend is downward,
the coefficient is given a minus sign. If all the points
fall on the line, the correlation is perfect and the coeffi-
cient is 1.0. When there is no upward or downward
trend in the line, the coefficient is zero. Validity data
generally show a clear upward trend with the plotted
points scattered above and below the trend line, as in
Figure 4. The correlation coefficient provides a conve-
nient way of summarizing a complicated set of data in
the form of a single number. Although many other ways
of analyzing validity data are useful for various pur-
poses, none has approached the correlation coeffi-
cient in general acceptance.
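
For readers who wish to see how the index is obtained, the product-moment correlation coefficient can be computed directly from paired scores and grades. The sketch below uses five invented students, not data from any validity study, and the function name is an assumption.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented test scores and first-year grade averages for five students.
scores = [400, 450, 500, 550, 600]
grades = [2.8, 3.4, 3.0, 3.6, 3.2]

r = pearson_r(scores, grades)

# Points that all fall on one rising (or falling) line give +1 (or -1).
r_up = pearson_r([1, 2, 3], [10, 20, 30])
r_down = pearson_r([1, 2, 3], [30, 20, 10])
```

The invented data were chosen so that the points scatter above and below an upward trend line, as validity data generally do.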



Figure 4. How the line of relation summarizes the main trend of
the relationship between test scores and grades.
(Each dot represents one student.)
The correlation coefficient for these data is about .50.

When results of validity studies are being considered,
the question arises as to just how close a relationship
a particular correlation represents. Table 1 has been
prepared to provide a partial answer to this kind of
question. In preparing this table, the group was divided
into top fifth, middle three-fifths, and low fifth on the
predictor and on the measure of success. Then the
probability that students standing at each level on the
predictor will attain each level on the measure of
success was determined using published tables of
probabilities for the bivariate normal distribution
(Schrader).

For example, with a correlation of .50, a student in the
top fifth on the predictor has 44 chances in 100 of
scoring in the top fifth and only 4 chances in 100 of
scoring in the bottom fifth on the measure of success.
Consideration of the data presented in Table 1 provides
a reasonable estimate of the strength of relationship
represented by different levels of correlation coeffi-
cients.
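
Chances of this kind can be approximated by numerical integration instead of published tables. The sketch below assumes a standard bivariate normal distribution and uses a simple midpoint rule; the function name, the step count, and the upper integration limit are choices made for this illustration and are not the procedure used to build Table 1.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def chances_for_top_fifth(r, steps=4000, upper_limit=8.0):
    """For a student in the top fifth on the predictor, return the
    probabilities of standing in the (bottom fifth, middle three-fifths,
    top fifth) on the measure of success, given correlation r."""
    cut = STD.inv_cdf(0.8)              # boundary of the top fifth
    spread = sqrt(1 - r * r)            # conditional standard deviation
    width = (upper_limit - cut) / steps
    p_bottom = p_top = 0.0
    for i in range(steps):
        x = cut + (i + 0.5) * width     # midpoint of the slice
        weight = STD.pdf(x) * width
        p_bottom += weight * STD.cdf((-cut - r * x) / spread)
        p_top += weight * (1 - STD.cdf((cut - r * x) / spread))
    p_bottom /= 0.2                     # condition on being in the top fifth
    p_top /= 0.2
    return p_bottom, 1 - p_bottom - p_top, p_top


p_bottom, p_middle, p_top = chances_for_top_fifth(0.50)
```

With a correlation of .50 the computed chances should come out close to the values of 44 in 100 and 4 in 100 quoted above; with a correlation of zero each fifth is equally likely.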

Because the combined effectiveness of two or more 
predictors is often the primary concern in validity stud- 
ies, the multiple correlation coefficient, which ex-
presses the correlation between the measure of suc-
cess and the best-weighted total of scores on two or
more predictors, is a valuable tool. For this purpose the 
weights are determined statistically so that the correla- 
tion will be as high as possible for the set of data being 
analyzed. Thus, the multiple correlation coefficient of 
graduate school first-year grades with a combination of 
GMAT scores and undergraduate grades can be com- 
pared with the corresponding correlation coefficient 
based on undergraduate grades only. 
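
For the simplest case of two predictors, the multiple correlation can be obtained from the three ordinary correlations by a standard formula. The sketch below is limited to that two-predictor case, whereas the studies summarized in this report used three predictors; the correlation values are invented for illustration and the function name is an assumption.

```python
from math import sqrt


def multiple_r(r_y1, r_y2, r_12):
    """Multiple correlation of a criterion with two predictors.

    r_y1, r_y2: correlations of each predictor with the criterion.
    r_12: correlation between the two predictors.
    """
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return sqrt(r_squared)


# Invented values: grades alone .25, test alone .35, predictors correlate .20.
grades_only = 0.25
combined = multiple_r(grades_only, 0.35, 0.20)
increment = combined - grades_only
```

The difference between the combined coefficient and the coefficient for undergraduate grades alone is the kind of increment in validity discussed in the results that follow.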

Table 1

Relation Between Standing on Predictor and Standing on
Criterion for Various Values of the Correlation Coefficient

Correlation    Standing on    Per Cent of Students Standing
Coefficient    Predictor      in Each Criterion Group

                              Bottom Fifth        Top Fifth



Validity Studies: Results 

The primary concern in this summary of validity study
results will be with the validity of undergraduate aver-
age grades used alone and with the validity of an opti-
mally weighted total of undergraduate average grades, 
GMAT Verbal scores, and GMAT Quantitative scores. 
Median correlation coefficients for five studies bearing 
on this question will be shown, as follows: 

(a) 1963-64 results for 17 schools that participated in 
both the 1963-64 studies and the 1967-70 studies; 

(b) 1967-70 results for the same 17 schools; 

(c) 1967-70 results for 69 studies conducted for 67 par- 
ticipating schools; 

(d) 1977-78 results for 10 schools participating in the 
pilot study of the new Validity Study Service; and 

(e) 1978-79 results for 25 participating schools. 

In order to enhance the comparability of results be-
tween the earlier and more recent studies, the medians
of the multiple correlation coefficients were calculated
for the earlier studies, using results published in Pit-
cher's (1972) survey of pre-1972 studies.

Table 2 shows the results for the five comparisons. 
Perhaps the most striking finding is the fact that the 
multiple correlation obtained by the use of three pre- 
dictors (undergraduate average grades, GMAT-Verbal, 
and GMAT-Quantitative) is substantially larger than the 
correlation of undergraduate average grades used 
alone. The increment in validity attributable to GMAT 
scores ranges from .16 for the 67 schools in the 1967-70 
studies to .22 for the 10 schools in the 1977-78 studies. 
Except for the first two sets of coefficients shown in
Table 2, the interpretation of the comparisons is ob-
scured by the fact that the schools represented in the
medians differ from one set of studies to another. The
importance of this point is supported by the fact that
1963-64 and 1967-70 studies show virtually identical re-
sults when the comparison is limited to schools that
participated in both studies. Under these circum-
stances, it is difficult to evaluate possible trends in the
results, particularly because the median validity of un-
dergraduate average grades ranges only from .23 to .29
and the multiple correlation ranges only from .39 to .48
in the five sets of studies.

Table 2

Median Validity Coefficients for Undergraduate Average
Grades Separately and in Combination with GMAT Test Scores

Content and Construct Validity

Tests that are used extensively in admissions have tra- 
ditionally been evaluated in terms of their effective- 
ness in predicting academic performance. At the same
time, the abilities tested have been judged to be rele-
vant to the kind of tasks that graduate management
students are called upon to perform. Considerations of
this kind have been implicit in the selection of item
types to be tried out for possible inclusion and in deci-
sions about what item types to introduce into GMAT.
The survey of 19 graduate schools by Casserly and
Campbell (1973) represented a more systematic effort
to approximate a job analysis of graduate study. Their
survey provided strong support for the relevance of the
verbal and quantitative abilities measured by GMAT.
Their finding that written English was important con-
tributed to the decision to include the Usage section in
current forms of GMAT.

In recent years the concept of construct validity has
received increasing attention from testers. Construct
validation calls for systematic efforts to find out what a
test measures, the study of the relation of test perfor-
mance to other variables, and the development and
testing of tentative theories that account for the ob- 
served results (Cronbach, 1971). The well-established 
principle that GMAT scores and undergraduate average 
grades supplement each other in the prediction of aca- 
demic performance shows how much can be gained by 
using different measures jointly rather than in isola- 
tion. Studies of such factors as age, sex, ethnic group 
membership, undergraduate major field, and previous 
business experience may be regarded as steps toward 
developing construct validity. Also relevant are the 
long-range prediction studies by Harrell (1969) and by
Crooks and Campbell (1974) and studies of factors af-
fecting test performance such as the speededness
study done by Evans and Reilly (1972). Although con- 
struct validation presents formidable tasks, it repre- 
sents a promising approach to better use of test 
scores. 

Reliability of Test Scores 

Because GMAT scores often play a significant role in 
decisions about individuals, high standards of reliabili- 
ty for these scores have been maintained since the pro- 
gram was begun. The importance of reliability for score 
interpretation arises from the fact that it measures the 
consistency of individual scores from one test form to 
another. Unless the test scores are highly reliable, an 
individual's relative standing, and hence his or her
score, would show excessive fluctuations. 

The logic of reliability is perhaps most readily under- 
stood if it is thought of as the correlation coefficient 
between scores on two forms of the same test. If the 
two forms are closely matched with respect to the abil-
ities that they measure, and if they include a reason-
ably large number of questions, we would expect each
examinee's relative standing to be quite similar on the
two forms. Thus we would expect that, if we adminis-
tered both forms to each member of a large group of ex-
aminees and calculated the correlation coefficient be-
tween the scores on the two forms, the resulting reli-
ability coefficient would be relatively high. Rather than
calculating reliability coefficients by this relatively
cumbersome procedure, it is customary to use certain
theoretical developments to estimate what the correla-
tion would be between scores on a test form and a sim-
ilar form. A more detailed account of the procedures
used in calculating reliability coefficients is given in
Appendix B.

The reliability coefficients for four recent forms of
GMAT are as follows:

            Reliability Coefficient of:

Form    GMAT Total    Verbal    Quantitative

A          .92          .90         .86
B          .92          .89         .87
C          .92          .90         .88
D          .93          .90         .88

These coefficients are based on a cross section of 
examinees at regular administrations of GMAT. The re- 
liability coefficients for GMAT Total are, as would be ex-
pected, somewhat higher than those for the two part
scores. However, the part scores may be considered to 
have acceptable reliability, particularly if both are 
taken into account in making decisions about individ- 
uals. The extent of relationship represented by a corre-
lation of .90 is shown in Table 1.

Standard Error of Measurement 

Although the reliability coefficient is useful for judging 
whether a test yields sufficiently reliable scores to war-
rant using the scores as one element in important deci-
sions, the standard error of measurement is more use-
ful in judging how much individual scores would fluctu-
ate from one form to another.

If a person could take a large number of forms of the 
same test, we may assume that his or her scores on 
these forms would follow a normal distribution with a 
standard deviation equal to the standard error of mea- 
surement. The mean score on all the forms is called the 
individual's true score. Thus, if the GMAT Total score 
has a standard error of measurement of 30 and if an ex- 
aminee has a true score of 500, we would estimate that 
approximately two-thirds of his or her observed scores 
would fall between 470 and 530 and 95 percent would 
fall between 440 and 560. There is no way of knowing, 
of course, whether the person's observed score on a 
particular occasion is higher or lower than his or her 
true score. The main value of the standard error of mea- 
surement is in providing some idea of how much varia- 
tion in observed scores we would expect to find if a per- 
son took different forms of the same test.
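
The link between reliability and the standard error of measurement can be illustrated with the standard relation in which the standard error equals the score standard deviation multiplied by the square root of one minus the reliability. The sketch below applies it to a reliability of .92 and a standard deviation of 107 (the value shown in Table 3); combining those two figures is an assumption made for illustration, since the report does not state the standard deviation of the group on which the reliability coefficients are based. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def standard_error_of_measurement(sd, reliability):
    """Standard relation: SEM = SD * sqrt(1 - reliability)."""
    return sd * sqrt(1 - reliability)


def score_band(true_score, sem, coverage):
    """Interval around the true score expected to contain the stated
    proportion of observed scores, assuming normally distributed errors."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    return true_score - z * sem, true_score + z * sem


sem_total = standard_error_of_measurement(107, 0.92)

# Bands for a true score of 500 and a standard error of 30.
low68, high68 = score_band(500, 30, 0.6827)   # about two-thirds of scores
low95, high95 = score_band(500, 30, 0.95)
```

The 95 percent band computed this way uses 1.96 standard errors, so it is slightly narrower than the rounded 440 to 560 range quoted above, which corresponds to two standard errors.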

The size of the standard error of measurement of
scores on GMAT for four recent forms was as follows:

            Standard Error of Measurement of:

Form    Total Score    Verbal    Quantitative

A           29            3           3
B           29            3           3
C           30            3           3
D           28            3           3



Standard Error of a Score Difference 

When an examinee repeats a test, a number of factors, 
including the effect of practice, growth in ability during 
the interval between tests, and differences in motiva-
tion and anxiety may affect the differences in scores. It
is possible, however, to estimate the probability that
various score differences are attributable to standard
error of measurement. For this purpose we need to
know the standard error of the difference. If the two
tests have equal standard errors of measurement, the
standard error of the difference is simply the standard
error of measurement times $\sqrt{2}$. Thus, if the standard
error of measurement is 30, the standard error of the
difference of two scores would be 42. Assuming that
the differences are normally distributed, we could con-
clude that, if the person's true score remains the same,
about two-thirds of the differences would be 42 points
or less and 95 percent would be 84 points or less. The
same reasoning can be followed in estimating the like-
lihood of various score differences for persons having
the same true scores.
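
The figures in this paragraph can be checked with a short calculation. The sketch assumes independent, normally distributed errors on the two testings; the function names are hypothetical.

```python
from math import erf, sqrt


def se_of_difference(sem_first, sem_second):
    """Standard error of the difference between two scores whose
    errors of measurement are independent."""
    return sqrt(sem_first ** 2 + sem_second ** 2)


def chance_difference_within(points, se_diff):
    """Probability that two scores differ by `points` or less when the
    true score is unchanged and errors are normally distributed."""
    return erf(points / (se_diff * sqrt(2)))


se_diff = se_of_difference(30, 30)            # 30 times the square root of 2
within_one = chance_difference_within(se_diff, se_diff)
within_84 = chance_difference_within(84, se_diff)
```

With equal standard errors of 30, the standard error of the difference is a little over 42, and differences of 84 points or less account for roughly 95 percent of cases.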

Table 3

Percentages of Candidates Tested from November 1975
through July 1978 Who Scored below
Selected Total Test Scores

Score                   Percentage below

700                            99
675                            98
650                            97
625                            94
600                            90
575                            85
550                            79
525                            71
500                            63
475                            53
450                            44
425                            36
400                            28
375                            21
350                            16
325                            11
300                             8
275                             5
250                             3
225                             2

Number of Candidates       457,103*
Mean                           461
Standard Deviation             107

*Candidates included were self-selected.

The fact that the standard error of measurement and 
the standard error of the difference can be computed 
for test scores offers some aid in interpreting test
scores, by indicating the limitations of scores even 
when tests are professionally constructed and accu- 
rately scored. Although standard errors of measure- 
ment are not available for non-test measures such as 
undergraduate average grades, it is plausible that they,
too, would vary if the person had attended a different
college, followed a different program of courses, or
even had different instructors. In summary, although
an individual's test scores are facts, inferences about 
the individual based on these facts are more realistic if 
account is taken of the fact that the reliability of the 
scores is less than perfect. 

Descriptive Statistics on Test Scores 

Although 500 was the mean score for examinees in 
1954, the mean of all GMAT takers for November 1975 
through June 1978 is 461. Thus, an examinee or admis- 
sions officer who wishes to know how well a score
compares with those earned by the whole group of ex-
aminees needs information on the distribution of
scores for current applicants to graduate management 
schools who take GMAT. This kind of information is 
provided in the Guide to the Use of GMAT Scores 
prepared for admissions officers and admissions com- 
mittees and in GMAT Candidate Score Interpretation 
Guide. The most recent table for GMAT Total from the
1978-79 Guide to the Use of GMAT Scores is shown in
Table 3. Users of this information are reminded that the 
group includes only those prospective graduate stu- 
dents of management who take the GMAT, and is thus 
self-selected. From the viewpoint of examinees, these 
descriptive statistics provide a rough idea of how well 
they have performed on the test. Because graduate 
management schools differ substantially in the score 
level of applicants that they attract and of students 
whom they enroll, examinees are urged, in the Bulletin 
of Information, to talk with a placement or counseling 
officer m their undergraduate college. By considering 
the student's test scores in relation to his or her col- 
lege record, and by drawing on experience with the 
success of students with various credentials in gaining 
admission to various graduate management schools, 
the placement or counseling officer can often provide a 
more meaningful interpretation of the scores than is 
provided by national statistics. 

Biographical Information for GMAT Examinees 

In recent years, GMAT examinees have been asked to
answer nine biographical data questions. For most of
these questions, their answers are transmitted as part
of the GMAT report to schools that they designate. Two
questions, concerned with self-reported language fluency
and with population subgroup membership, are
asked solely for research purposes, and the responses
given by an individual to those questions are never reported.
Results for a few questions will be reported
here because they help to describe the total group of
examinees. Unless otherwise noted, the results are
based on 457,730 questionnaires completed in 1975-76,
1976-77, and 1977-78. (Because test repeaters completed
a questionnaire each time they were tested, the
number of individuals included in the sample is appreciably
less than the number of questionnaires
analyzed.)

Table 4

Representation of Undergraduate Major Fields
in GMAT Examinee Group, 1975-1978

Major Field                        Percent of Examinees

Business and Commerce (39.9%)
  Accounting                         13.9
  Management                          9.3
  Marketing
  Finance
  Business Education
  Industrial Relations                0.5
  Hotel Administration                0.2
  Other Business and Commerce         5.3

Social Science
  Economics                           8.0
  Psychology                          3.7
  Political Science
  History                             3.4
  Education                           2.0
  Sociology                           1.9
  Government                          0.5
  Other Social Science                2.1

Science (21.5%)
  Engineering                        10.3
  Mathematics                         3.0
  Biological Sciences                 2.9
  Chemistry                           1.5
  Computer Science                    0.9
  Physics                             0.6
  Architecture                        0.4
  Statistics                          0.2
  Other Science                       1.7

Humanities (7.1%)
  English                             3.0
  Foreign Language                    1.7
  Fine Arts                           1.0
  Philosophy                          0.9
  Other Humanities                    0.5

Other Major (6.2%)

Graduate students in management are drawn from a
wide spectrum of undergraduate major fields, as
shown in Table 4. Roughly two-fifths of the examinees
majored in Business and Commerce, about one-fourth
majored in Social Sciences, and over one-fifth majored
in Science. When major fields are considered separately,
Accounting (13.9%), Engineering (10.3%), Management
(9.3%), and Economics (8.0%) are most heavily
represented in the examinee group. Indeed, these
four fields account for more than two-fifths of all examinees.
Another background item of special interest
concerns the number of months of full-time work experience
reported by the examinees. Results for the total
group may be summarized as follows:



No response or zero months    16.7%
1-12 months                   19.9%
13-24 months                  14.6%
25-60 months                  23.9%
61 or more months             24.9%



In this group nearly one-half (48.8%) had more than two
years of full-time work experience and nearly one-fourth
(24.9%) had more than five years of work experience.

Table 5

Representation of Male and Female U.S. Citizens Who
Reported Membership in Various Population
Subgroups in the GMAT Examinee Group,* 1975-78

Population Subgroup                  Percentage of
                                     Examinee Group

American Indian (0.3%)
  Male                                  0.2
  Female                                0.1

Black/Negro/Afro-American (6.5%)
  Male                                  3.8
  Female                                2.7

Caucasian/White (87.4%)
  Male                                 64.1
  Female                               23.3

Mexican American/Chicano (0.7%)
  Male                                  0.6
  Female                                0.1

Oriental/Asian (3.0%)
  Male                                  2.1
  Female                                0.9

Puerto Rican (0.4%)
  Male                                  0.3
  Female                                0.1

Other (1.7%)
  Male                                  1.3
  Female                                0.4

Many GMAT examinees plan to pursue their graduate
studies on a part-time basis. In the 1975-78 group,
46.9% reported that they planned to enroll full-time,
41.1% reported that they planned to enroll part-time,
and 12.0% were undecided.

Representation of various population subgroups in
the 1975-78 examinee group is shown in Table 5. It is a
matter of some interest that the six population groups
other than Caucasian/White constitute 12.6%, or about
one-eighth, of the examinees who are United States
citizens and who reported their population group membership.

A separate tabulation of men and women, based on
all but two members of the total 1975-78 sample,
showed that 74.3% were male and 25.7% were female.
(These results differ slightly from the totals for males
and females for the sample on which Table 5 was
based.) Evidence from other tabulations shows a rising
percentage of female examinees: for the 1973-75 group,
the percentage was 18.2 and for the 1978-79 group it
was 31.5.

The great majority of GMAT examinees takes the ex- 
amination after college graduation. A tabulation of re- 
sponses for 1977-78 examinees showed the following 
distribution of years of graduation (or expected gradua- 
tion): 



1979                2.7%
1978               25.4%
1973-77            49.6%
1972 or earlier    22.3%



In this group, well over two-thirds (71.9%) of examinees 
were tested after they had completed college, and 
more than one-fifth (22.3%) were tested more than five 
years after completing college. 



*Of the total group of examinees, 15% reported that they were not U.S. citizens,
4.9% indicated that they did not care to respond to the question on population
membership, and 4.8% did not respond to the question.
APPENDIX A
Sample Items For Item Types Used In GMAT 



This appendix includes examples of each item type 
that has been included in GMAT since the test was first 
administered in 1954. For each item type, the month 
and year of the administration at which it was intro- 
duced is stated. For item types not currently in use, the 
month and year of the first administration at which it 
was replaced is stated. 

For the item type designated as Verbal, three subtypes
(Analogies, Antonyms, and Sentence Completion)
are shown. For the Quantitative item type, two
subtypes (Data Interpretation and Problem Solving) are
shown.

Characteristically, item types that involve reading a 
passage or interpreting a tabular or graphic presenta- 
tion include five multiple choice items. For most sam- 
ples given in this appendix, however, only one of the 
multiple choice items is shown. 

The correct response for each sample item is shown, 
either by starring the correct response or, in a few in- 
stances, by showing the correct response in paren- 
theses. 

Verbal 

Introduced: 2/54 
Replaced: 2/61
Restored: 2/66 
Replaced: 10/76 

Analogies 

Directions: In each of the following questions, a related
pair of words or phrases is followed by five lettered
pairs of words or phrases. Select the lettered pair
which best expresses a relationship similar to that expressed
in the original pair.

Astronomy : Astrology 

(A) chemistry : alchemy* (B) biology : botany 

(C) religion : mythology (D) geography : geology 
(E) medicine : magic 

Antonyms 

Directions: Each question below consists of a word
printed in capital letters, followed by five words or
phrases lettered A through E. Choose the lettered word
or phrase which is most nearly opposite in meaning to
the word in capital letters.

Since some of the questions require you to distinguish
fine shades of meaning, be sure to consider all
the choices before deciding which one is best.

DOUR: (A) blithe* (B) talkative (C) inflexible 
(D) nest (E) modish 



Sentence Completion

Directions: Each of the sentences below has one or
more blank spaces, each blank indicating that a word
has been omitted. Beneath the sentence are five lettered
words or sets of words. You are to choose the one
word or set of words which, when inserted in the sentence,
best fits in with the meaning of the sentence as
a whole.

The manufacture of cupboards and doors, bathtubs and cooking
stoves, taking place as it does in factories, should be unaffected
by ______; but since the articles are parts of buildings
and there is no demand for them unless buildings are going
up, they too are ______ in activity.

(A) price . . sluggish (B) cost . . expensive 
(C) weather .. seasonal* (D) methodology .. regulated 
(E) policies . .unstable 

Quantitative 

Introduced: 2/54 
(Currently in use) 

Data Interpretation

Directions: In this section solve each problem, using 
any available space on the page for scratch work. Then 
indicate the one correct answer in the appropriate 
space on the answer sheet. 



Per Cent of the Total Value of U.S. Lend-Lease
Supplies Received by U.S. Allies

                              1st Year    2nd Year
Britain                         68%         38%
Russia                           5%         30%
All others                      27%         32%
Total Value of Supplies
(in billions of dollars)         2           8

What per cent of the total value of lend-lease supplies for both
years was received by Russia and Britain combined?

(A) 31 (B) 44 (C) 69* (D) 70.5 (E) 141

Problem Solving 

If the length of a rectangle is increased by 10 per cent and the
width by 40 per cent, by what per cent is the area increased?

(A) 4 (B) 15.4 (C) 50 (D) 54* (E) 400 

Best Arguments 

Introduced: 2/54 
Replaced: 11/61 
Directions: The questions in this part are based on situations
which involve some sort of dispute or disagreement.
In most of the questions you will be asked to
evaluate the arguments which might be offered by the
disputants; some questions will require you to analyze
the situations in other ways. You are to assume that
these disputes are being brought before an intelligent
lay arbitrator (not a court of law) for decision; the questions,
therefore, will not involve any legal precedents or
technicalities. You are to evaluate the situations objectively
in terms of ordinary concepts of reasonableness
and fair play and base your answers on a logical analysis
of the facts and arguments as they are presented
to you.

Bruce Bond, a broker, one morning overhears a famous financier
say, "The price of American Beartrap stock will go
sky-high within two weeks." Later that day Pete Goodfellow,
an old friend to whom Bond owes many favors,
calls on Bond to ask for advice about investments. He emphasizes
that he wants to buy some stock on which he
can make money quickly because he is in a tight financial
spot. Bond says that American Beartrap is the best buy he
knows at the moment. When Goodfellow protests that he
has never heard of American Beartrap, Bond replies that
the basis of his recommendation is information received
from a reliable source. Goodfellow accepts the advice and
invests heavily. Within two weeks American Beartrap
stock has become virtually worthless, Goodfellow's entire
investment is lost, and Goodfellow is ruined financially.
Goodfellow thereupon accuses Bond of causing his financial
downfall.

Which one of the following arguments best supports Goodfel- 
low's accusation? 

(A) Bond should not have presumed to give Goodfellow any
advice.

(B) Goodfellow naturally believed that Bond wanted to help
him.

(C) Bond had misrepresented his knowledge of the situation.*

(D) Bond should have cautioned Goodfellow not to invest too
heavily in the stock.

(E) Bond had taken advantage of Goodfellow's obvious lack of
knowledge of financial matters.

Quantitative Reading 

Introduced: 2/54
Replaced: 11/61

There have been many suggestions that in an emergency
the professional schools, particularly medical
schools, accelerate their programs, thus graduating
more trained men and women. If more doctors are to be
trained we must have more of the three essentials of
such a process (teachers, students, and equipment)
or we must utilize those which we have to greater effect.
But objections have been made to asking students
and faculty to work through the four quarters of
the year, and the plan herewith submitted, recognizing
these objections, attempts instead a fuller utilization of
the third essential, equipment and supplies, to realize
effectively the objectives of an accelerated program.
The proposed plan, which is essentially the more frequent
admission of freshman classes, is designed for
those schools which operate on the quarter, as opposed
to the semester, plan. Following this plan such a
school could graduate four classes in three years by
the admission of a freshman class every nine months.

An illustration from the table below may serve to
clarify the proposal. In accordance with the plan, one
class, indicated on the table by the letter A, would enter
medical school as freshmen in the Summer Quarter
of 1951, continue in school through three consecutive
quarters, and go on vacation during the Spring Quarter
of 1952. Students in class B would enter as freshmen in
the Spring Quarter of 1952, continue in school through
the Summer and Autumn Quarters of 1952, and go on
vacation during the Winter Quarter, at which time students
in class C would begin their freshman year. As
can be seen from the table, there would always be a
freshman, sophomore, junior, and senior class in
school.



Quarter Plan Organization for Class Acceleration

            Summer          Autumn          Winter          Spring
1951-52     A1              A2 X4 Y7 Z10    A3 X5 Y8 Z11    B1 X6 Y9 Z12
1952-53                     A5 B3 X7 Y10    A6 C1 X8 Y11    B4 C2 X9 Y12
1953-54     A7 B5 C3        A8 B6 D1 X10    A9 C4 D2 X11    B7 C5 D3 X12
1954-55     A10 B8 C6 E1    A11 B9 D4 E2    A12 C7 D5 E3    B10 C8 D6 F1
1955-56     B11 C9 E4 F2    B12 D7 E5 F3    C10 D8 E6 G1    C11 D9 F4 G2
1956-57                     D10 E8 F6 H1    D11 E9 G4 H2
1957-58     E10 F8 G6 I1    E11 F9 H4 I2    E12 G7 H5 I3    F10 G8 H6 J1
1958-59                     F12 H7 I5 J3    G10 H8 I6 K1    G11 H9 J4 K2



Classes X, Y, and Z are included in the plan 

(A) as the second, third, and fourth new classes

(B) as convenient symbols to take up the lapse before the admission
of B

(C) as illustrations of the classes which work through four
quarters a year

(D) to show how classes already formed fit into the new plan*

(E) to indicate the necessity for a summer recess

Organization of Ideas 

Introduced: 11/61 
Replaced: 11/66 

Directions: Each set of questions in this section con- 
sists of a number of statements. Most of these state- 
ments refer to the same subject or idea. The state- 
ments can be classified as follows: 

(A) the central idea to which most of the statements are re- 
lated; 

(B) main supporting ideas, which are general points directly 
related to the central idea; 

(C) illustrative facts or detailed statements, which document
the main supporting idea;

(D) statements irrelevant to the central idea.

The sentences do not make up one complete paragraph. They
may be regarded as the components of a sentence outline for
a brief essay. The outline might, for example, have the following
form:

A contains the central idea
   B contains a main supporting idea
      C presents an illustrative fact
   B contains a main supporting idea
      C presents an illustrative fact
      C presents an illustrative fact

Classify each of the following sentences in accordance with
the system described above.

(B) The Roman roads connected all parts of the empire with
Rome.

(B) The Roman roads were so well built that some of them remain
today.

(A) One of the greatest achievements of the Romans was their
extensive and durable system of roads.

(D) Wealthy travelers in Roman times used horse-drawn
coaches.

(C) Along Roman roads caravans would bring to Rome luxuries
from Alexandria and the East.

Directed Memory— Reading Recall 

Introduced: 11/61 
Replaced: 10/77 

Directions: In the test you will be given a period of time
for the study of several extended prose passages.
Then, without looking back at the passages, you will
answer questions based on their contents. The following
exercise is much shorter than those appearing on
the test, but it illustrates the general nature of the passages
and the questions. Remember, though, that on
the test you will not be allowed to refer back to the passages.

SAMPLE PASSAGE:

Soon after the First World War began, public attention
was concentrated on the spectacular activities of the submarine,
and the question was raised more pointedly than
ever whether or not the day of the battleship had ended.
Naval men conceded the importance of the U-boat and
recognized the need for defense against it, but they still
placed their confidence in big guns and big ships. The
German naval victory at Coronel, off Chile, and the British
victories at the Falkland Islands and in the North Sea convinced
the experts that fortune still favored superior guns
(even though speed played an important part in these battles);
and, as long as British dreadnoughts kept the German
High Seas Fleet immobilized, the battleship remained
in the eyes of naval men the key to naval power.

Public attention was focused on the submarine because

(A) it had immobilized the German High Seas Fleet

(B) it had played a major role in the British victories at the
Falkland Islands and in the North Sea

(C) it had taken the place of the battleship

(D) of its spectacular activities*

(E) of its superior speed
Data Sufficiency

Introduced: 11/61
(currently in use)

Directions: Each of the questions below is followed by
two statements labeled (1) and (2), in which certain data
are given. In these questions you do not actually have
to compute an answer, but rather you have to decide
whether the data given in the statements are sufficient
for answering the question. Using the data given in the
statements plus your knowledge of mathematics and
everyday facts (such as the number of days in July),
you are to blacken space

A if statement (1) ALONE is sufficient, but statement (2)
alone is not sufficient to answer the question asked;

B if statement (2) ALONE is sufficient, but statement (1)
alone is not sufficient to answer the question asked;

C if BOTH statements (1) and (2) TOGETHER are sufficient
to answer the question asked, but NEITHER
statement ALONE is sufficient;

D if EACH statement ALONE is sufficient to answer the
question asked;

E if statements (1) and (2) TOGETHER are NOT sufficient
to answer the question asked, and additional data specific
to the problem are needed.

(E) In a four-volume work, what is the weight of the third volume?

(1) The four-volume work weighs 5 pounds.

(2) The first three volumes together weigh 6 pounds.

Practical Business Judgment

Introduced: 11/72
(currently in use)

Directions: The passage in this section is followed by
two sets of questions, data evaluation and data application.
In the first set, data evaluation, you will be required
to classify certain of the facts presented in the
passage on the basis of their importance, as illustrated
in the following example.

(This sample passage is much shorter than passages
appearing in the test, but it is representative of data
evaluation material.)

SAMPLE PASSAGE 

Fred North, a prospering hardware dealer in Hillidale,
Connecticut, felt that he needed more store space to accommodate
a new line of farm equipment and repair parts
that he intended to carry. A number of New York City commuters
had recently purchased tracts of land in the environs
of Hillidale and there had taken up farming on a
small scale. Mr. North, foreseeing a potential increase in
farming in that area, wanted to expand his business to
cater to this market. North felt that the most feasible and
appealing recourse open to him would be to purchase the
adjoining store owned by Mike Johnson, who used the
premises for his small grocery store. Johnson's business
had been on the decline for over a year since the advent of
a large supermarket in the town. North felt that Johnson
would be willing to sell the property at reasonable terms,
and this was important since North, after the purchase of
the new merchandise, would have little capital available
to invest in the expansion of his store.

Consider each item separately in terms of the passage and
choose

A if the item is a MAJOR OBJECTIVE in making the decision,
that is, one of the outcomes or results sought by the
decision-maker;

B if the item is a MAJOR FACTOR in making the decision,
that is, a consideration, explicitly mentioned in the passage,
that is basic in determining the decision;

C if the item is a MINOR FACTOR in making the decision, that
is, a secondary consideration that affects the criteria
tangentially, relating to a Major Factor rather than to an
Objective;

D if the item is a MAJOR ASSUMPTION in making the decision,
that is, a supposition or projection made by the
decision-maker before weighing the variables;

E if the item is an UNIMPORTANT ISSUE in making the decision,
that is, a factor that is insignificant or not immediately
relevant to the situation.

SAMPLE DATA EVALUATION QUESTIONS 

(D) Increase in farming in the Hillidale area 

(A) Acquisition of property for expanding store 

(B) Cost of Johnson's property 

(C) State of Johnson's grocery business 

(E) Quality of the farm equipment North intends to sell

A second set of questions, data application, requires
judgments based on a comparison of the available alternatives
in terms of the relevant criteria, in order to
attain the objectives stated in the passage.

Each of the following questions relates to the passage.
For each question, choose the best answer.

SAMPLE DATA APPLICATION QUESTIONS 

I Potential demand for farm equipment in the Hillidale area

II Desire to undermine Mike Johnson's business

III Higher profit margin on farm equipment than on hardware
goods

(A) I only* (B) III only (C) I and II only

(D) II and III only (E) I, II, and III

Usage 

Introduced: 10/76 
(currently in use) 

Directions: The following sentences contain problems
in grammar, usage, diction (choice of words), and
idiom. Some sentences are correct. No sentence contains
more than one error.

You will find that the error, if there is one, is underlined
and lettered. Assume that all other elements of the
sentence are correct and cannot be changed. In choosing
answers, follow the requirements of standard written
English.

If there is an error, select the one underlined part that
must be changed in order to make the sentence correct,
and blacken the corresponding space on the
answer sheet.

If there is no error, mark answer space E.

(C) He spoke bluntly and angrily to we spectators. No error
A B C D E

(A) He works every day so that he would become financially
independent in his old age. No error
A B C D E

Reading Comprehension 

Introduced: 10/77
(currently in use) 

Directions: Each passage in this group is followed by 
questions based on its content. After reading a pas- 
sage, choose the best answer to each question and 
blacken the corresponding space on the answer sheet. 
Answer all questions following a passage on the basis 
of what is stated or implied in that passage. 

One sample reading passage follows. (It is much 
shorter than passages in the test, but it illustrates their 
general nature.) 

SAMPLE PASSAGE 

Not until the mid-1960's did any agriculturally based
unions in the Southwest show promise of sustained operation.
Mexican Americans were involved in efforts during
the 1920's and 1930's to establish farm labor unions,
but although these efforts resulted in dramatic and partially
successful strikes, they were episodic and without
organizational continuity.

The migratory work pattern compounded the problem
of labor organization. A dispersed population in motion is
not an easy target for organizational appeals. Industrial
unionism had the advantage of mobilizing workers who
were more concentrated in workplace and residence. It
was easier to organize a work force whose members filed
in and out of the workplace at fixed locations and at fixed
times and who were exposed to daily contact with organizers.
In addition, the multiple employer structure of
farm work, partially a function of labor mobility, made it
less likely that union gains in one area could be transferred
to another.

QUESTION ON READING PASSAGE 

According to the passage, when did the first efforts of Mexi- 
can Americans to form agricultural unions take place? 

(A) At the turn of the century 

(B) During the 1920's*

(C) During the 1930's

(D) Immediately after the Second World War

(E) During the 1960's
APPENDIX B

Calculating Reliability Coefficients and Equating Parameters



Calculating Reliability Coefficients

For the relatively unspeeded homogeneous subtests of
the type that constitute GMAT, the reliability coefficient
for each subtest can readily be calculated. All necessary
data for these calculations are readily obtained
when the test is given at a regular test administration.
Once the reliability of each subtest has been determined,
the reliability of GMAT Total, Verbal, and Quantitative
scores can also be determined by well established
methods.

The method used for calculating subtest reliabilities
is based on the assumption that the intercorrelations
of all items included in the subtest are substantially
uniform, as would be expected to happen if all items
are measuring the same basic ability. For example, if
each item on a reading comprehension test were correlated
with each other item, the whole set of intercorrelations
would be expected to be essentially uniform.
Under these conditions, convenient formulas for calculating
the reliability coefficient can be derived.

The particular formula used in calculating reliability
coefficients for GMAT subtests (called Kuder-Richardson
Formula No. 20) has been adapted for use with
formula-scored tests. To calculate the reliability of a
GMAT part, the following items of information are
needed:

(a) the number of items (called "n"),

(b) the number of right answers for each item (called "R"),

(c) the number of wrong answers for each item (called "W"),

(d) the fraction by which wrong answers are multiplied before
subtracting from the number of right answers in calculating
formula scores (called "k"),

(e) the standard deviation of formula scores (called
"$\sigma_t$"), and

(f) the number of examinees in the sample (called "N").
In describing the process of calculating a reliability
coefficient, formulas for using the item analysis results
may be considered first. In these formulas, the symbol
"$\sum$" means that the results are to be summed over all
items. The formulas are:

\[ A = \frac{N \sum R - \sum R^2}{N^2} \]

\[ B = \frac{N \sum W - \sum W^2}{N^2} \]

\[ C = \frac{\sum RW}{N^2} \]

It may be useful to express the first formula in words.
In calculating A, we begin by summing the number of
right answers for all items and multiplying the results
by the number of examinees. From that result, we subtract
the sum of the squares of the number of right
answers for all items. The resulting number is then
divided by the square of the number of examinees.

The values calculated by these three formulas are
then substituted in the following formula:

\[ \text{Reliability} = \frac{n}{n-1} \left[ 1 - \frac{A + k^2 B + 2kC}{\sigma_t^2} \right] \]

The following numerical sample illustrates the calculation
of the reliability of the Reading Comprehension
subtest. Because the test is composed of 25 five-choice
items, n is equal to 25, and k is equal to .25. The
basic data for the calculations are obtained from the
computer printout, as follows:

\[
\begin{aligned}
N &= 2{,}080 \\
\sum R &= 31{,}456 \\
\sum R^2 &= 42{,}480{,}212 \\
\sum W &= 15{,}961 \\
\sum W^2 &= 12{,}427{,}015 \\
\sum RW &= 17{,}731{,}905 \\
\sigma_t &= 5.3847
\end{aligned}
\]

From these figures, we can compute values for A, B,
and C, as follows:

\[ A = \frac{(2{,}080)(31{,}456) - 42{,}480{,}212}{(2{,}080)^2} = 5.3042 \]

\[ B = \frac{(2{,}080)(15{,}961) - 12{,}427{,}015}{(2{,}080)^2} = 4.8012 \]

\[ C = \frac{17{,}731{,}905}{(2{,}080)^2} = 4.0985 \]

The reliability for the Reading Comprehension part may
now be calculated, as follows:

\[ \text{Reliability} = \frac{25}{24} \left[ 1 - \frac{5.3042 + (.25)^2 (4.8012) + 2(.25)(4.0985)}{(5.3847)^2} \right] \]

The resulting value for the reliability of this part is .767.
Similar calculations provide reliability coefficients for
each of the other parts.
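
The Reading Comprehension calculation can be checked with a short script. What follows is a minimal sketch, not the program that produced the report's printout: the function name and argument names are illustrative assumptions, and the formula is written the way A, B, and C enter the worked reliability calculation (a factor of n/(n - 1), with A + k²B + 2kC divided by the squared standard deviation of formula scores).

```python
def formula_score_reliability(n, k, N, sum_r, sum_r2, sum_w, sum_w2, sum_rw, sd):
    """Reliability of a formula-scored part from item-analysis sums.

    n      number of items
    k      fraction of wrong answers subtracted in formula scoring
    N      number of examinees
    sum_r  sum over items of the number of right answers
    sum_r2 sum over items of the squared number of right answers
    sum_w, sum_w2  the same sums for wrong answers
    sum_rw sum over items of (right answers * wrong answers)
    sd     standard deviation of formula scores
    """
    A = (N * sum_r - sum_r2) / N**2
    B = (N * sum_w - sum_w2) / N**2
    C = sum_rw / N**2
    return (n / (n - 1)) * (1 - (A + k**2 * B + 2 * k * C) / sd**2)


# Values from the Reading Comprehension example (25 five-choice items).
reading_comprehension = formula_score_reliability(
    n=25, k=0.25, N=2080,
    sum_r=31456, sum_r2=42480212,
    sum_w=15961, sum_w2=12427015,
    sum_rw=17731905, sd=5.3847,
)
print(round(reading_comprehension, 3))  # 0.767
```

Working from the unrounded values of A, B, and C gives the same three-decimal result as the hand calculation.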

Once the reliability of each part has been calculated,
it becomes possible to calculate the reliability of the
total score. For this purpose, we need the standard error
of measurement of each part. The standard error of
measurement can be defined by the following formula:

\[ \text{Standard Error of Measurement} = \sigma \sqrt{1 - \text{reliability}} \]

It can be shown that the reliability of the total score
equals:

\[ 1 - \frac{\text{Sum of squared standard errors of measurement}}{\text{Squared standard deviation of total score}} \]

Table B1 illustrates the calculation of the reliability
of the Total score for a recent form of GMAT. A similar
procedure is followed in calculating the reliability of
the Verbal and Quantitative part scores.

Table B1

Calculation of Reliability Coefficient of GMAT Total Score

I. Basic Data

                                                      Standard        Squared Standard
                          Reliability   Standard      Error of        Error of
Part                      of Part       Deviation*    Measurement**   Measurement

Reading Comprehension     .7667         5.3847        2.6008          6.7642
Problem Solving           .7489         4.6941        2.3520          5.5319
Practical Judgment        .6219         3.9185        2.4094          5.8052
Data Sufficiency          .8029         6.1773        2.7428          7.5230
Usage                     .7626         5.5644        2.7111          7.3501
Practical Judgment        .5889         3.8463        2.4663          6.0826

Total of Squared Standard Errors of Measurement                       39.0570

 *The standard deviation of total scores is 21.8836.
**These values were calculated using more decimal places than are shown in the
table.

II. Calculations

\[ \text{Reliability} = 1 - \frac{\text{Sum of squared standard errors of measurement}}{\text{Squared standard deviation}} \]

\[ \text{Reliability} = 1 - \frac{39.0570}{(21.8836)^2} = 1 - \frac{39.0570}{478.8919} = .918 \]
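
The Table B1 calculation can be reproduced from the part reliabilities and standard deviations. The sketch below is illustrative only; the variable and function names are assumptions, and because it starts from the rounded values printed in the table, the recomputed standard errors can differ from the tabled ones in the third or fourth decimal place.

```python
import math

# (part, reliability, standard deviation) as printed in Table B1.
# Practical Judgment appears twice because the table lists it twice.
parts = [
    ("Reading Comprehension", 0.7667, 5.3847),
    ("Problem Solving", 0.7489, 4.6941),
    ("Practical Judgment", 0.6219, 3.9185),
    ("Data Sufficiency", 0.8029, 6.1773),
    ("Usage", 0.7626, 5.5644),
    ("Practical Judgment", 0.5889, 3.8463),
]
total_sd = 21.8836  # standard deviation of total scores (table footnote)


def standard_error_of_measurement(sd, reliability):
    """SEM = standard deviation * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


# Sum the squared SEMs of the parts, then compare with the total-score variance.
sum_sq_sem = sum(
    standard_error_of_measurement(sd, rel) ** 2 for _, rel, sd in parts
)
total_reliability = 1.0 - sum_sq_sem / total_sd**2

for name, rel, sd in parts:
    print(f"{name:22s} SEM = {standard_error_of_measurement(sd, rel):.4f}")
print(f"Total-score reliability = {total_reliability:.3f}")
```

The total-score reliability comes out higher than that of any single part because the error variances of the parts add, while the total-score variance also includes the covariances among the parts.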

Calculating Equating Parameters 

The basic method of equating used for GMAT scores
from 1962 to the present time makes use of the well
known statistical principle that if a very large group is
divided at random into two or more subgroups, the resulting
subgroups will be quite similar in every characteristic.
In applying this method we administer a form
for which scaled scores are already known to one random
subgroup and we administer the new form to a different
random subgroup. Assuming that the two random
subgroups are equal in the ability measured by
GMAT, we attribute differences in scores between the
two forms to a difference in the difficulty of the two
tasks.

In order to relate raw scores on a new form to the
GMAT scale, we need to know:

(a) the equation relating raw scores to scaled scores
for the old form;

(b) the mean and standard deviation of raw scores for
the random subgroup of examinees who took the
old form; and

(c) the mean and standard deviation of raw scores for
the random subgroup of examinees who took the
new form.

The following data were available for 9,850 exam- 
inees who took the old form and 9,795 examinees who 
took the new form: 

                      Old Form      New Form

Mean                  65.765279     63.586932

Standard Deviation    22.795999     23.601557

Equating files show that the equation for converting
raw scores to scaled scores for the old form has a multiplier
($A_o$) of 4.6369 and an additive constant ($B_o$) of
167.0300.

To solve for the multiplier for the new form, we use
the formula

\[ A_n = A_o \left( \frac{\sigma_o}{\sigma_n} \right) = 4.6369 \left( \frac{22.795999}{23.601557} \right) \]

\[ A_n = 4.478635, \]

where $\sigma_o$ and $\sigma_n$ are the standard deviations of raw
scores on the old and new forms.

This value agrees with the computer determined
value of 4.4786.

To solve for the additive constant for the new form,
we use the formula

\[ B_n = A_o M_o - A_n M_n + B_o, \]

where $M_o$ and $M_n$ are the raw score means on the old and
new forms, so that

\[
\begin{aligned}
B_n &= (4.6369)(65.765279) - (4.478635)(63.586932) + 167.0300 \\
    &= 304.9470 - 284.7827 + 167.0300 \\
    &= 187.1943.
\end{aligned}
\]

As it turned out, the value obtained when $B_n$ was calculated
by the computer was also 187.1943, although the
sample calculation did not reproduce in detail the calculations
performed by the computer.
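
The two equating parameters can be recomputed in a few lines. This is a minimal sketch under the stated random-groups assumption, not the equating program itself; the function and argument names are hypothetical. It uses the multiplier and additive-constant formulas given above, with the old-form multiplier taken as 4.6369, the value used in the worked calculation.

```python
def equate_new_form(a_old, b_old, mean_old, sd_old, mean_new, sd_new):
    """Return (multiplier, additive constant) for the new form.

    scaled score = multiplier * raw score + additive constant
    """
    a_new = a_old * sd_old / sd_new
    b_new = a_old * mean_old - a_new * mean_new + b_old
    return a_new, b_new


a_new, b_new = equate_new_form(
    a_old=4.6369, b_old=167.0300,
    mean_old=65.765279, sd_old=22.795999,
    mean_new=63.586932, sd_new=23.601557,
)
print(round(a_new, 4), round(b_new, 3))  # 4.4786 187.194

# Consistency check: the raw-score means of the two random subgroups
# convert to the same scaled score.
scaled_old_mean = 4.6369 * 65.765279 + 167.0300
scaled_new_mean = a_new * 63.586932 + b_new
print(round(scaled_old_mean, 2), round(scaled_new_mean, 2))
```

By construction, a raw score that lies a given number of standard deviations above the new-form mean receives the same scaled score as a raw score the same number of standard deviations above the old-form mean.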